Introduction

This analysis explores relationships between indicators across countries. For a first impression of all indicators under observation, see the ‘indicators.R’ file in the ‘Scripts’ directory.

In this analysis, we restrict ourselves to three World Bank indicators: countries’ percentage of agricultural land, CO2 emissions per capita in megatonnes, and surface area in square kilometers. The exploration is divided into two research questions:

1. Is there a relationship between the percentage of agricultural land and CO2 emissions per capita across countries?

2. Does the size of the surface area of the country play a role?

World Bank Indicators
Variable | Indicator Name | Definition
AG.LND.AGRI.ZS | Agricultural land (% of land area) | Agricultural land refers to the share of land area that is arable, under permanent crops, and under permanent pastures. Arable land includes land defined by the FAO as land under temporary crops (double-cropped areas are counted once), temporary meadows for mowing or for pasture, land under market or kitchen gardens, and land temporarily fallow. Land abandoned as a result of shifting cultivation is excluded. Land under permanent crops is land cultivated with crops that occupy the land for long periods and need not be replanted after each harvest, such as cocoa, coffee, and rubber. This category includes land under flowering shrubs, fruit trees, nut trees, and vines, but excludes land under trees grown for wood or timber. Permanent pasture is land used for five or more years for forage, including natural and cultivated crops.
AG.SRF.TOTL.K2 | Surface area (sq. km) | Surface area is a country’s total area, including areas under inland bodies of water and some coastal waterways.
EN.GHG.CO2.MT.CE.AR5 | Carbon dioxide (CO2) emissions (total) excluding LULUCF (Mt CO2e) | A measure of annual emissions of carbon dioxide (CO2), one of the six Kyoto greenhouse gases (GHG), from the building sector (subsector of the energy sector) including IPCC 2006 codes 1.A.4 Residential and other sectors, 1.A.5 Non-Specified. The measure is standardized to carbon dioxide equivalent values using the Global Warming Potential (GWP) factors of IPCC’s 5th Assessment Report (AR5).

(Sources: https://data.worldbank.org/indicator/AG.LND.AGRI.ZS?view=chart, https://data.worldbank.org/indicator/AG.SRF.TOTL.K2?view=chart, https://data.worldbank.org/indicator/EN.GHG.CO2.BU.MT.CE.AR5?view=chart)

The dataset we have been provided with contains longitudinal observations of 25 countries with 18 indicators each. The data were recorded yearly from 2000 to 2021.

Agenda

1.) Percentage of agricultural land and CO2 emissions per capita

1.1.) Heat map of CO2 emissions

1.2.) Boxplot of CO2 emissions

1.3.) Point-line plot of agricultural land with faceted countries

1.4.) Boxplot of agricultural land

1.5.) Scatter plot of variables of interest

1.6.) Point-line plot of CO2 emissions with faceted countries and color scale

1.7.) Point-line plot of variables of interest with faceted countries; normalized

2.) Role of surface area in previous relationship

2.1.) Point-line plot of surface area with faceted countries

2.2.) Bar plot of absolute changes with faceted countries; changing countries

2.3.) Point-line plot of relative changes with faceted countries; changing countries

2.4.) Boxplot of surface area

2.5.) Scatter plot of variables of interest with color scale and with faceted grouping

2.6.) Point-line plot of CO2 emissions with faceted grouping and color scale

2.7.) Point-line plot of variables of interest with faceted grouping; normalized

1. Percentage of agricultural land and CO2 emissions per capita

We analyze how the percentage of agricultural land relates to CO2 emissions per capita. To get an overview of the data of interest and to be able to evaluate later insights correctly, we start by looking at the two indicators separately.

1.1. Heat map of CO2 emissions

Starting with the distribution of the CO2 emissions in megatonnes for each country over the observed time frame, we get the following information.

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##     0.246     8.782    50.328   672.243   232.015 12717.655

Occasionally, CO2 emissions within a country differ immensely from one year to the next, visible as clear jumps in the sequential color scale. After investigating this phenomenon in the database and consulting our supervisors, we concluded that the false values originate from mishandling by the database during the download of the data set. This mishandling explains the seemingly arbitrary distribution of single entries that are too low by a factor of ten, in some cases even by a factor of 100. Moving forward, we accept these anomalies and treat them as the error-induced outliers they are, keeping them in mind and taking later insights with a grain of salt.
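Such factor-of-ten drops can also be searched for programmatically. The following is a minimal Python sketch, independent of the scripts behind this report: the function name `flag_scale_drops`, the 25 percent tolerance, and the example series are all hypothetical. It flags entries that lie close to one tenth or one hundredth of the series median.

```python
from statistics import median

def flag_scale_drops(values, factors=(10, 100), tolerance=0.25):
    """Return indices of entries that are roughly 1/10 or 1/100 of the series median."""
    reference = median(values)
    flagged = []
    for i, v in enumerate(values):
        for f in factors:
            expected = reference / f
            # Flag the entry if it lies within the relative tolerance of the scaled median.
            if abs(v - expected) <= tolerance * expected:
                flagged.append(i)
                break
    return flagged

# Hypothetical yearly emissions (Mt) for one country, not taken from the data set.
series = [50.1, 51.3, 5.2, 52.8, 53.0, 0.54, 54.9]
suspects = flag_scale_drops(series)
print(suspects)
```

Using the median as reference keeps the check robust as long as only a minority of a country's entries is affected. A strongly trending series would need a local reference, such as the neighboring years, instead.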

1.2. Boxplot of CO2 emissions

## Dispersion between countries:  2787843

CO2 emissions vary strongly within countries. At the same time, there are enormous differences in absolute amounts between countries. The greatest challenge may therefore lie in comparing the different countries’ values and trends, even though the data is provided on a per capita basis. For later comparisons of the two variables of interest, we may switch to a logarithmic display of the CO2 emissions to better visualize the span of the countries’ deviations.
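To illustrate why a logarithmic display helps, the short Python sketch below applies a base-10 logarithm to the minimum, median, and maximum from the summary in section 1.1. The helper name `log10_scale` is an assumption made for this illustration.

```python
import math

def log10_scale(values):
    """Map strictly positive values to their base-10 logarithm."""
    return [math.log10(v) for v in values]

# Minimum, median, and maximum of the CO2 emissions as reported in section 1.1.
summary = [0.246, 50.328, 12717.655]
logged = log10_scale(summary)

# Number of orders of magnitude between the smallest and the largest value.
span_orders = logged[-1] - logged[0]
print(round(span_orders, 2))
```

On the raw scale the maximum is tens of thousands of times larger than the minimum. On the logarithmic scale the same range covers fewer than five units, so small and large emitters fit on one axis.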

1.3. Point-line plot of agricultural land with faceted countries

Furthermore, the distribution of the percentage of agricultural land delivers the following information.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.564  29.805  43.411  42.040  56.129  80.439

1.4. Boxplot of agricultural land

## Dispersion between countries:  437.7944

In contrast to the CO2 emissions, the percentages of agricultural land vary rather little within countries. However, there are recognizable differences between countries, ranging from roughly five to eighty percent. Since we operate on a bounded percentage scale, comparisons should nevertheless work quite well.
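The distinction between dispersion within and between countries can be made explicit by splitting the total sum of squares into the two parts. How the dispersion figures reported above were computed is not stated, so the following Python sketch shows only one common way to do it. The function name `decompose` and the data are hypothetical.

```python
from statistics import fmean

def decompose(groups):
    """Split the total sum of squares into a between-group and a within-group part.

    groups maps a country name to its list of yearly values.
    Returns (total, between, within).
    """
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = fmean(all_values)
    total = sum((v - grand_mean) ** 2 for v in all_values)
    between = sum(len(vals) * (fmean(vals) - grand_mean) ** 2
                  for vals in groups.values())
    within = sum((v - fmean(vals)) ** 2
                 for vals in groups.values() for v in vals)
    return total, between, within

# Hypothetical percentages of agricultural land for two countries over three years.
data = {"A": [10.0, 11.0, 12.0], "B": [50.0, 51.0, 52.0]}
total, between, within = decompose(data)
print(total, between, within)
```

In this toy example almost all of the dispersion lies between the two countries, which mirrors the pattern described for agricultural land: stable values within a country, clear differences across countries.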

Moving on, we bring the two variables back together. Plotting all collected observations while disregarding their country of origin gives us the following cloud of data points. Note that the CO2 emissions are now displayed on a logarithmic scale to counter the large disparity in values.

1.5. Scatter plot of variables of interest

We recognize a slightly positive linear relationship between the variables: countries with a higher percentage of agricultural land account, on average, for more megatonnes of CO2 emissions per capita, while countries with a lower percentage account for fewer. However, this view completely ignores the development over time and the country of origin of each observation. To take those factors back into consideration, we first distinguish among the countries by faceting the visualization, allowing an in-depth comparison of the indicators for each country over time.
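The strength of such a linear relationship can be quantified with the Pearson correlation coefficient between the percentage of agricultural land and the logarithm of the CO2 emissions. The Python sketch below uses invented numbers and a hand-written helper `pearson`; it is not the computation behind the plot.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical pooled observations: agricultural land (%) and CO2 emissions (Mt).
agri_land = [10.0, 30.0, 45.0, 60.0, 80.0]
co2 = [5.0, 40.0, 20.0, 300.0, 150.0]

# Correlate against log10(CO2), matching the logarithmic axis of the scatter plot.
r = pearson(agri_land, [math.log10(v) for v in co2])
print(round(r, 3))
```

A coefficient near zero indicates no linear relationship, while values towards one indicate a strong positive one. Because the observations are pooled, the coefficient says nothing about the development within a single country.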

1.6. Point-line plot of CO2 emissions with faceted countries and color scale


We chose points rather than lines to counter the false entries for CO2 emissions, since lines would display the error-induced outliers even more prominently. In addition, the development between data entries is unknown, which prevents us from assuming linear transitions.

However, the facets, sorted by ascending average percentage of agricultural land, show no obvious connection between the two indicators: the CO2 emissions develop quite arbitrarily regardless of the associated percentage of agricultural land.

1.7. Point-line plot of variables of interest with faceted countries; normalized

To dig even deeper, we now normalize the CO2 emissions as well as the percentage of agricultural land within each country. This lets us investigate relative changes on the same scale for both indicators and compare them without the constraints caused by different orders of magnitude.


We recognize differing developments of the two indicators for many of the countries. Aruba shows no data points for agricultural land: its percentage does not change over the years, so the min-max normalization cannot be computed. For the other countries, the developments of the two indicators follow no obvious pattern, which leads us to the next step, the introduction of the countries’ surface area.
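The Aruba case follows directly from the definition of min-max normalization, (x - min) / (max - min): for a constant series the denominator is zero. A minimal Python sketch, with a hypothetical helper name and invented values, makes this explicit.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] within one country; undefined for a constant series."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant series: (x - min) / (max - min) would be 0 / 0, so no value is defined.
        return [None] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical series: one changing over time, one constant like Aruba's agricultural land.
changing = min_max_normalize([2.0, 4.0, 6.0])
constant = min_max_normalize([11.1, 11.1, 11.1])
print(changing, constant)
```

Returning `None` for a constant series is a design choice of this sketch. It keeps the missing result visible instead of silently mapping every value to zero, which would suggest a flat but valid normalized curve.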

2. Role of surface area in previous relationship

A further aspect that might change the observed relationship is the countries’ surface area, which we now introduce as an additional variable. The definition of surface area in the indicator table at the beginning shows the importance of clearly distinguishing between a country’s surface area and its land area. While the percentage of agricultural land expresses agriculturally used area as a share of the land area, the surface area refers to the country’s complete area, including areas under inland bodies of water and some coastal waterways. A direct comparison of the two indicators is therefore not possible without keeping this difference in mind.

2.1. Point-line plot of surface area with faceted countries

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##      180   243610   796100  2461119  1285220 17098250
## Number of countries without changes:  10

Several countries show no changes in surface area at all throughout the time span of interest. Before moving on, we therefore zoom in on the countries with changes to understand how relevant those changes are.

2.2. Bar plot of absolute changes with faceted countries; changing countries

For the vast majority of these countries, the changes amount to less than 1,000 square kilometers over the whole time span, which makes them negligible for our analysis. Similar insights can be derived from the relative changes.

2.3. Point-line plot of relative changes with faceted countries; changing countries

For every country, even those with changes during the time span, surface area changes by at most a marginal two percent. As stated before, these changes are negligible for our analysis, allowing us to assume that countries do not switch their order with respect to surface area during the time frame. This claim can finally be confirmed by looking at the decomposition of the dispersion.
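Both statements, the bound on relative changes and the stable ordering of countries, can be checked directly. The Python sketch below uses invented surface areas; the helper names `max_relative_change` and `ranking` are assumptions for this illustration.

```python
def max_relative_change(values):
    """Largest absolute deviation from the first year's value, relative to that value."""
    base = values[0]
    return max(abs(v - base) / base for v in values)

def ranking(area_by_country, year_index):
    """Country names ordered by surface area in the given year."""
    return sorted(area_by_country, key=lambda c: area_by_country[c][year_index])

# Hypothetical surface areas (sq. km) for three countries over three years.
areas = {
    "A": [100.0, 100.0, 101.0],
    "B": [500.0, 495.0, 490.0],
    "C": [2000.0, 2000.0, 2000.0],
}

largest_change = max(max_relative_change(v) for v in areas.values())
rankings = [ranking(areas, i) for i in range(3)]
order_is_stable = all(r == rankings[0] for r in rankings)
print(largest_change, order_is_stable)
```

If the order is identical in every year, averaging each country's surface area over time loses no information about its position relative to the other countries.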

2.4. Boxplot of surface area

## Dispersion between countries:  1.783129e+13

Moreover, from here on we drop the focus on the development of this variable over time and shift the perspective to whether the average absolute surface area plays any role in the relationship between agricultural land and CO2 emissions for the observed countries.

This allows us to classify the countries into quantiles based on their average values, as was done for all provided indicators during the analysis, this time using the surface area. We group the countries into the following segments:

Quantile | Q1 | Q2 | Q3 | Q4 | Q5
Surface area | Very Low | Low | Medium | High | Very High
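A grouping of this kind can be sketched as follows. This Python example assigns groups by rank, which is an assumption; the actual analysis may have used quantile breakpoints instead. The function name and the example data are hypothetical.

```python
LABELS = ["Very Low", "Low", "Medium", "High", "Very High"]

def quintile_groups(mean_area):
    """Assign each country to one of five equally sized groups by mean surface area."""
    ordered = sorted(mean_area, key=mean_area.get)
    n = len(ordered)
    # The rank within the sorted list decides the group; min() guards the upper edge.
    return {country: LABELS[min(rank * 5 // n, 4)]
            for rank, country in enumerate(ordered)}

# Hypothetical mean surface areas (sq. km) for ten countries.
mean_area = {f"C{i}": 100.0 * (i + 1) for i in range(10)}
groups = quintile_groups(mean_area)
print(groups)
```

With 25 countries, this rank-based assignment yields five groups of five countries each.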

Applying the newly established grouping to the earlier comparison of the percentage of agricultural land and the megatonnes of CO2 emissions per capita shows the following relationships.

2.5. Scatter plot of variables of interest with color scale and with faceted grouping

At first glance, the relationship carries over into each of the five groups as a weak to medium positive linear relationship; the countries with medium surface area are the only ones with almost no measurable relationship. For all other groups, the earlier finding seems to apply again: countries with a higher percentage of agricultural land account, on average, for more megatonnes of CO2 emissions per capita, while countries with a lower percentage account for fewer.

Further, the distribution of the data points within the facets catches the eye: the very small and the large countries have comparatively low CO2 emissions, while the very large countries show the, presumably expected, by far highest CO2 emissions in the data set. The role of surface area for the relationship itself seems to be quite small, as it stays basically unchanged regardless of the countries’ classification.

2.6. Point-line plot of CO2 emissions with faceted grouping and color scale

The biggest anomalies regarding CO2 emissions, with the percentage of agricultural land in mind, seem to be the countries with medium and very large surface area. On the one hand, the medium-area countries have comparatively high percentages of agricultural land, but these do not translate into any obvious differences in CO2 emissions compared to the other groups. On the contrary, their CO2 emissions are on average even lower than those of the very small and small countries. On the other hand, the very large countries stand out with the, presumably expected, highest CO2 emissions among all groups. The groups differ only marginally in how strongly emissions grow over time, with all groups showing slightly increasing trends in CO2 emissions per capita.

To round off this exploration, we want to dig deeper by looking at the development over time within the groups, with another twist.

2.7. Point-line plot of variables of interest with faceted grouping; normalized

Finally, we return to the normalized comparison from earlier and repeat it with the data grouped by surface area category.

Even with the countries of interest categorized by surface area, we cannot identify any obvious connection between CO2 emissions per capita and the percentage of agricultural land.

Here we notice something fascinating: while the four smaller groups show partly parallel, or at least comparable, developments over the years, the very large countries stand out. Although their percentage of agricultural land has fallen drastically over the years, CO2 emissions have risen regardless. One possible explanation could be that, for the very large countries in our dataset, the decline in agricultural land was accompanied by increasing urbanization, leading to even greater CO2 emissions than those caused by agriculture.

Whether this really is the case can only be speculated at this point. With further information on aspects such as urban, forest, or water area, further analyses of this issue are possible and advised.